A method and a device for source coding



FIELD OF THE INVENTION 



The present invention relates generally to source coding of data. In particular the
invention concerns predictive speech coding methods that represent a speech signal via a
speech synthesis filter and an excitation signal thereof.

BACKGROUND OF THE INVENTION

Modern wireless communication systems such as GSM (Global System for Mobile
communications) and UMTS (Universal Mobile Telecommunications System) transfer
various types of data over the air interface between network elements such as a
base station and a mobile terminal. As the general demand for transfer capacity
continuously rises due to e.g. new multimedia services becoming available, new and more
efficient data compression techniques have to be developed, as radio frequencies can
nowadays be considered scarce resources. Data compression is traditionally also used
for reducing storage space requirements in computer data systems, for example.
Likewise, different coding methods have been developed during the last few decades.

Data is usually compressed (compacted) by utilizing a so-called encoder, to be
subsequently regenerated with a decoder for later exploitation whenever needed. Data
coding techniques may be classified according to a number of different approaches.
One is based on the coding result: a lossless coder produces a compacted presentation of
the source data but no information is actually lost during the encoding process, i.e.
after decoding the data matches the original data, whereas a lossy coder produces a
compacted presentation of the source data the decoding result of which does not
completely correspond to the original presentation anymore. However, a data loss is not
a problem in situations wherein the user of the data either cannot distinguish the
differences between the original and once compacted data, or the differences do not, at
least, cause severe difficulties or objection in exploiting the slightly degraded data.
As human senses including hearing and vision are somewhat limited, it is, for example,
possible to remove unnecessary details from pictures, video or audio signals without
considerably disturbing the final sensation effect. Often source coders produce
fixed-rate output, meaning the compaction ratio does not depend on the input data.
Alternatively, a variable-rate coder takes the statistics of the input data into account
while analysing it, thus outputting compacted data with variable rate. Variable-rate
coding surely has certain benefits over fixed-rate models. Considering e.g. the field of
speech coding, a variable-rate codec (coder-decoder) can maximise the capacity and
minimize the average bit-rate for given speech quality. This originates from the
non-stationarity (or quasi-stationarity) of a typical human speech signal: a single
speech segment, as the coders process a certain period of speech at a time, may comprise
either a very regular signal (e.g. periodically repetitive voiced sound) or a strongly
fluctuating signal (transitions etc.), thus directly affecting the amount of bits
required for sufficient representation of the segment under analysis. In addition,
considering especially mobile networks, achieved savings in source coding may be used
for enhancing e.g. channel coding, thus resulting in a better tolerance against
interference on the radio path. A fixed-rate coder, in contrast, has to use a compromise
rate that is low enough to save transmission capacity but high enough to code difficult
segments with adequate quality, the compromise rate obviously being unnecessarily high
for "easy" segments.

Still, as the nature and targeted use of the source data define on a case-by-case basis
the optimum means for coding it, the idea of a generic optimum coder directly
applicable to any possible scenario is utopian; the development of source coding has
diverged into many directions, each exploiting the data statistics and the imperfections
of human senses to the maximum extent in a specialized manner.

In case of mobile networks, a speech coder is definitely one of the most crucial
elements in providing the [...] experience in addition to various voice storage and
voice message services. Modern speech coders have a common starting point: compact
representation of digitised speech while preserving speech quality, truly a subjective
measure concerning speech intelligibility and naturalness although sometimes also
"objectively" measured by utilizing weighted distortion measures, but the techniques
used in modeling vary greatly. One speech coding model heavily utilized today is called
CELP (Code Excited Linear Prediction). CELP coders like GSM EFR (Enhanced Full Rate),
the adaptive multi-rate codec AMR and TETRA ACELP (Algebraic Code Excited Linear
Prediction) belong to the group of AbS (Analysis by Synthesis) coders and produce the
speech parameters by modeling the speech signal via minimizing an error between the
original and synthesized speech in a loop. CELP coders carry features from both waveform
(common PCM etc.) and vocoder techniques.



WO 2005/034090 



PCT/FI 2004/000579 




Vocoders are parametric coders that exploit, for example, a source-filter approach in
speech parameterisation. The source models the signal originated by the air-flow passing
from the lungs to the glottis through either vibrating vocal cords (resulting in voiced
sounds) or stiff vocal cords (resulting in unvoiced sounds with turbulence originated
from different shapes within the vocal tract) up to the oral cavities (mouth, throat),
to be finally radiated out through the lips.

Figure 1 discloses a generic sketch of a simplified human speech production model,
called an LP (Linear Predictive) model, that is utilized in many contemporary speech
coding methods like CELP. The process is called linear prediction since current output
S(n) is determined by a weighted sum of previous output values and an input value
generated by pulse source 102 or noise source 104 depending on the nature of speech,
roughly divided into voiced in the former and unvoiced in the latter case. Pulse
source 102 emitting the impulse train imitates the vibration at the glottis with a
corresponding fundamental frequency called a pitch frequency with a certain pitch
period. Source type may be altered during the synthesis process via switch 106. Before
filtering the excitation source signal with all-pole IIR (Infinite Impulse Response)
filter 110 modeling the vocal tract, it is multiplied by a proper gain factor in
multiplier 108. Therefore, speech synthesis can be performed by first defining the class
of the current speech segment under consideration as either voiced or unvoiced, and then
by driving the excitation signal of the selected type through a multiplier and a
synthesis filter. More about LP and speech modeling or coding in general can be found in
reference [...].
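The LP synthesis model of figure 1 can be illustrated with a minimal sketch. The function names, the sign convention A(z) = 1 + sum a(i) z^-i, the uniform noise source and the default pitch period are assumptions made for illustration only; they are not taken from any particular codec.

```python
import random


def pulse_train(length, pitch_period):
    """Voiced source (cf. pulse source 102): unit impulses one pitch period apart."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]


def noise(length, seed=0):
    """Unvoiced source (cf. noise source 104): uniform white noise in [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]


def lp_synthesize(a, excitation, gain=1.0):
    """All-pole filtering: s(n) = gain * x(n) - sum_i a[i-1] * s(n - i)."""
    out = []
    for n, x in enumerate(excitation):
        acc = gain * x
        for i, coeff in enumerate(a, start=1):
            if n - i >= 0:
                acc -= coeff * out[n - i]
        out.append(acc)
    return out


def synthesize_segment(a, voiced, length, pitch_period=50, gain=1.0):
    """Select the source by the voiced/unvoiced decision (the switch), then filter."""
    source = pulse_train(length, pitch_period) if voiced else noise(length)
    return lp_synthesize(a, source, gain)
```

With a single coefficient a(1) = -0.5, a unit impulse produces the decaying response 1, 0.5, 0.25, which shows how each output sample is a weighted sum of previous outputs plus the scaled input.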

A typical CELP coder, presented in figure 2, and a corresponding decoder, presented in
figure 3, comprise several filters for modeling speech generation, namely at least a
short-term filter such as an LP(C) synthesis filter used for modeling the spectral
envelope (formants, i.e. resonances of the vocal tract) and a long-term filter the
purpose of which is to model the oscillation of the vocal cords inducing periodicity in
the voiced excitation signal comprising impulses separated by the current pitch period
called a lag. The modeling is substantially performed on a single speech segment, called
a frame hereinafter, at a time. As can be noticed from figure 3, the decoder structure
resembles the common LP synthesis model with an additional LTP (Long-Term Prediction)
filter. The excitation signal is created on the basis of an excitation vector for the
respective block. For example, in ACELP coders the excitation consists of a fixed number
of non-zero pulses, the positions and amplitudes of which are selected by utilizing a
search in which a perceptually weighted error term between the original and synthesized
speech frame is minimized.

Considering CELP coding and decoding in more detail, a preview of codec internals is
presented herein. The encoder includes short-term analysis function 204 to form a set of
direct form filter coefficients called LP parameters a(i), where i=1,2,...,m (m thus
defining the order of the analysis), for example. Parameters a(i) are calculated once
for a speech frame of N samples, N corresponding e.g. to a time period of 20
milliseconds. As speech has a quasi-stationary nature, meaning it may be considered
stationary if the inspection period is short enough (<=20 ms), optimum filter
coefficients can be calculated for a single frame by utilizing standard mathematical
means such as Wiener filter theory, which requires signal stationarity, on a
frame-by-frame basis. The resulting equation with computationally exhaustive matrix
inversion may then be effectively calculated by exploiting e.g. the so-called
autocorrelation method and Levinson-Durbin recursion. See reference [2] for further
information. LP parameters a(i) are exploited in searching the lag value matching best
with the speech frame under analysis, in calculating a so-called LP residual by
filtering the speech with the LPC analysis (or "inverse") filter, being the inverse A(z)
of LPC synthesis filter 1/A(z), and naturally as coefficients of LPC synthesis filter
210 while creating a synthesized speech signal ss(n). The lag value is calculated in LTP
analysis block 202 and used by LTP synthesis filter 208. The long-term predictor and
corresponding synthesis filter 208, being the inversion thereof, [...]. The tap may
optionally have a gain factor g2 (defining the total gain of the one-tap LTP filter). LP
parameters are also utilized in the excitation codebook search as described below.
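The autocorrelation method with Levinson-Durbin recursion mentioned above can be sketched as follows. This is a textbook formulation under the assumed convention A(z) = 1 + sum a(i) z^-i; windowing, lag windowing and bandwidth expansion used by real codecs are deliberately left out, and all function names are hypothetical.

```python
def autocorrelation(frame, order):
    """r[lag] = sum_t frame[t] * frame[t - lag] for lag = 0..order."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t - lag] for t in range(lag, n))
        for lag in range(order + 1)
    ]


def levinson_durbin(r, order):
    """Solve the normal equations recursively; returns (a[1..m], prediction error)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) frame: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lp_residual(frame, a):
    """Inverse filtering with A(z): e(n) = s(n) + sum_i a[i-1] * s(n - i)."""
    return [
        s + sum(a[i - 1] * frame[n - i] for i in range(1, len(a) + 1) if n - i >= 0)
        for n, s in enumerate(frame)
    ]
```

For the autocorrelation sequence 1, 0.5, 0.25 of a first-order process, the recursion returns a(1) = -0.5 and a(2) = 0, and inverse filtering the matching impulse response leaves only the initial impulse as the residual.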

In a basic CELP coder, after determination of the lag value T and LP parameters a(i),
iteration for a perfect excitation codebook vector according to the selected error
criteria is started. In some advanced coding models it is possible to fine-tune the lag
value or even the LP parameters while searching for a perfect excitation vector. During
an iteration round, excitation vector c(n) is selected from codebook 206, filtered
through LTP and LPC synthesis filters 208, 210, and the resulting synthesized speech
ss(n) is finally compared 218 with the original speech signal s(n) in order to determine
the difference, error e(n). Weighting filter 212 that is based on the characteristics of
human hearing is used to weight error signal e(n) in order to attenuate frequencies at
which the error is less important according to the auditory perception, and to
correspondingly amplify frequencies that matter more. For example, errors in the areas
of "formant valleys" may be emphasized as the errors in the synthesized speech are not
so audible in the formant frequencies due to the auditory masking effect. Codebook
search controller 214 is used to define the index u of the code vector in codebook 206
according to the weighted error term acquired from weighting filter 212. Consequently,
index u indicating a certain excitation vector leading to a minimum possible weighted
error is eventually selected. The controller also provides scaling factor g that is
multiplied 216 with the code vector under analysis before LTP and LPC synthesis
filtering. After a frame has been analysed, the parameters describing it (LP parameters
a(i), LTP parameters like T and optionally also gain g2, codebook vector index u or
other identifier thereof, codebook scaling factor g) are sent over the transmission
channel (air interface, fixed transfer medium etc.) to the decoder.

In the decoder, codebook 306 corresponds to the one in the encoder and is used for
generating excitation signal c(n) on the basis of received codebook index u.
Excitation signal c(n) is then multiplied 312 with scaling factor g and directed to the
LTP synthesis filter supplied with T and g2; finally the effect of the vocal tract is
added to the synthesized speech signal by LPC synthesis filtering 310, providing decoded
speech signal ss(n) as output.

Considering the fixed codebook vector selection in an ACELP type speech encoder,
the pulse positions are determined by minimizing the error between the actual weighted
input signal and a synthesized version thereof

$$ \varepsilon = \lVert s_p - g_2 H v - g H c \rVert^2 \qquad (1) $$

where s_p is the perceptually weighted input speech, H is an LP model impulse response
matrix, c is the selected codebook vector and v is a so-called "adaptive codebook"
vector. The minimization of the above error is in practice performed by maximizing the
term

$$ \frac{\left( s_T^T H c_k \right)^2}{c_k^T H^T H c_k} \qquad (2) $$

where s_T = s_p - g_2 H v is hereinafter called a "target signal", being equivalent to
the perceptually weighted input speech signal from which the contribution of the
adaptive codebook has been removed. k is the index of fixed codebook vector c under
analysis.
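The search criterion of the fixed codebook can be sketched with a brute-force loop over all two-pulse vectors with signs of plus or minus one. This is only an illustration of the maximized term: real ACELP codecs restrict pulse positions to tracks and use fast search strategies, and the names `filter_through_h` and `search_two_pulses` are hypothetical.

```python
from itertools import combinations, product


def filter_through_h(c, h):
    """Return H*c, H being the lower-triangular convolution matrix built from h."""
    return [
        sum(h[n - j] * c[j] for j in range(n + 1) if n - j < len(h))
        for n in range(len(c))
    ]


def search_two_pulses(target, h):
    """Exhaustively maximize (s_T^T H c_k)^2 / (c_k^T H^T H c_k) over two signed pulses.

    Returns the best code vector and the gain g that scales it optimally.
    """
    length = len(target)
    best_score, best_vector, best_gain = -1.0, None, 0.0
    for p1, p2 in combinations(range(length), 2):
        for s1, s2 in product((1.0, -1.0), repeat=2):
            c = [0.0] * length
            c[p1], c[p2] = s1, s2
            y = filter_through_h(c, h)
            energy = sum(v * v for v in y)
            if energy == 0.0:
                continue
            corr = sum(t * v for t, v in zip(target, y))
            score = corr * corr / energy
            if score > best_score:
                best_score, best_vector, best_gain = score, c, corr / energy
    return best_vector, best_gain
```

With an identity impulse response and a target having peaks of 3 and -2, the search places a positive pulse on the first peak and a negative pulse on the second one.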




The concept of the adaptive codebook is illustrated in figure 4 disclosing the CELP
synthesis model in an alternative manner, being quite similar to the common human
speech production model of figure 1. However, the main difference lies in the
excitation signal generation part: as seen from figure 4, in CELP coders the selection
of voiced/unvoiced excitation is not usually made at all and the excitation includes
adaptive codebook part 402 and fixed codebook part 404 corresponding to excitation
signals v(n) and c(n) respectively, which are individually weighted (g2, g) and then
summed 408 together to form final excitation u(n) for LPC synthesis filter 410. Thus
the periodicity of the LP residual presented in figures 2 and 3 with the separate LTP
filter connected in series with the LPC synthesis filter can be alternatively depicted
as a feedback loop and adaptive codebook 402 comprising a delay element controlled by
lag value T.
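The feedback loop of figure 4 can be sketched as a delay line feeding back the past excitation. The function name and argument layout are assumptions for illustration; the relation u(n) = g2 v(n) + g c(n) with v(n) = u(n - T) follows the description above.

```python
def build_excitation(past_excitation, fixed_vector, lag, g2, g):
    """Form u(n) = g2 * v(n) + g * c(n), where v(n) = u(n - lag).

    past_excitation holds the previous samples of u, the most recent one last.
    """
    if not 1 <= lag <= len(past_excitation):
        raise ValueError("lag must be between 1 and the length of the past excitation")
    history = list(past_excitation)
    start = len(history)
    for n, c_n in enumerate(fixed_vector):
        v_n = history[start + n - lag]  # delay element controlled by the lag
        history.append(g2 * v_n + g * c_n)
    return history[start:]
```

Because new samples are appended to the same history that the delay element reads, lags shorter than the vector length reuse samples generated moments earlier, which is the feedback behaviour described for adaptive codebook 402.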



To concretise the goal of the algebraic fixed codebook search that is performed after
the LPC and LTP analysis stages, an imaginary target signal of a single frame that
should be modeled with an algebraic codebook to a maximum extent is presented in figure
5. Now if two pulses are available, the optimum position for them is near peaks 502, 504
in order to minimize the energy left in the remaining error signal. In this particular
example, exactly two pulses with adjustable sign can be included in the frame. In a
typical encoder, the number of codebook pulses per frame and the amplitudes thereof are
predefined, although the overall amplitude of codebook vector c(n) can be altered via
gain factor g. In addition to mere frames, the original signal may be divided into a
number of sub-frames (e.g. 1-4) as well, which are then separately parameterised in
relation to all or some of the required parameters. For example, LPC analysis that
results in LPC coefficients may be executed only once per frame, thus a single set of LP
parameters covers the whole frame, whereas codebook vectors (fixed algebraic and/or
adaptive) can be analysed for each sub-frame.






Gain factor g can be calculated by

$$ g = \frac{s_T^T H c_k}{c_k^T H^T H c_k} \qquad (3) $$
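The link between this gain factor and the term maximized in the fixed codebook search can be made explicit with a short derivation. It assumes that, for a given code vector, the error has the least-squares form shown on the first line; this form is an assumption consistent with the definitions of the target signal and matrix H, not a formula quoted from the source.

```latex
\begin{align*}
\varepsilon(g) &= \lVert s_T - g H c_k \rVert^2
  = s_T^T s_T - 2 g\, s_T^T H c_k + g^2\, c_k^T H^T H c_k \\
\frac{\partial \varepsilon}{\partial g} = 0
  &\;\Rightarrow\; g = \frac{s_T^T H c_k}{c_k^T H^T H c_k} \\
\varepsilon_{\min} &= s_T^T s_T - \frac{\left( s_T^T H c_k \right)^2}{c_k^T H^T H c_k}
\end{align*}
```

Since the first term of the minimum error does not depend on k, choosing the code vector that minimizes the error is the same as choosing the one that maximizes the subtracted ratio.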



Although contemporary methods for modeling and regenerating an applicable
excitation signal for the LP synthesis filter seem to provide somewhat adequate results
in many cases, a number of problems still exist therein. It is obvious that depending on
the original input signal the prediction error may or may not have serious peaks left in
the time domain presentation. The scenario can vary, and thus the fixed number of
corrective pulses per frame may sometimes be enough to raise the modeling accuracy
to a moderate level but sometimes not. Occasionally, as with some of the existing
speech coders, the modeling result may actually get worse by adding unnecessary
pulses into the excitation signal when the codec specifications do not allow altering
the number of pulses in a single frame. On the other hand, if the number of pulses in a
frame and thus the total output bit-rate is varied, the modeling process is surely more
flexible but also more complex as regards reception of variable length frames etc.
Variable output bit-rate may also complicate network planning as transmission
resources required by a single connection for transferring speech parameters are not
fixed anymore.

Figure 8A discloses a target signal in a scenario wherein a frame has been divided into
four sub-frames. LPC analysis is performed once per frame, and LTP and fixed
codebook analysis on a sub-frame basis. The target signal comprises severe
fluctuations 802, 804, 806, 808 in sub-frame 3. However, as algebraic code vectors
contain exactly two pulses, they may be placed to cover peaks 802 and 804, but
peaks 806 and 808 are left intact, thus degrading the modeling result.

Another defect in prior art coders relates to the so-called closed-loop search of the
adaptive codebook vector relating to the LTP analysis.

Usually an open-loop analysis is executed first in order to find a rough estimate of the
lag T and gain g2 concerning e.g. a whole frame at a time. During the open-loop search a
weighted speech signal is simply correlated with delayed versions of itself, one at a
time, in order to locate correlation maxima. Of the delay values corresponding to these
autocorrelation maxima, in principle especially the one producing the highest maximum
then moderately predicts the lag term T, as the correlation maximum often results from
the speech signal periodicity.
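The open-loop search can be sketched as a plain correlation scan over candidate delays. The function name and the lag range arguments are hypothetical, and the unnormalized correlation is a simplification; practical codecs typically normalize the correlation and favour shorter lags to avoid pitch multiples.

```python
def open_loop_lag(weighted_speech, min_lag, max_lag):
    """Return the delay whose correlation with the undelayed signal is highest."""
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        corr = sum(
            weighted_speech[n] * weighted_speech[n - lag]
            for n in range(lag, len(weighted_speech))
        )
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag
```

For an impulse train with a period of five samples, the scan over lags 2 to 12 finds maxima at 5 and 10 and returns 5, the one with the highest correlation.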

Thereafter, in a more accurate, closed-loop adaptive codebook search, LTP filter lag T
and gain g2 values are determined by minimizing the weighted error between the
original and synthesized speech as in the algebraic fixed codebook search. This is
achieved e.g. in the AMR codec on a sub-frame basis by maximizing the term (x(n)
denoting the target signal of the search):

$$ R(k) = \frac{\sum_{n=0}^{L} x(n)\, y_k(n)}{\sqrt{\sum_{n=0}^{L} y_k(n)\, y_k(n)}} \qquad (4) $$

where L is the sub-frame length (e.g. 40 samples) - 1, y(n) = v(n)*h(n), and y_k is thus
the past LP synthesis filtered excitation (adaptive codebook vector) at delay k. More
details about open/closed loop searches especially in the case of the AMR codec can be

found in reference [3]. However, as it is clear that the actual excitation for the span
of the current frame is still unknown upon maximising the above term, the current LP
residual is used as a substitute in scenarios with short delay values. See figure 9A for
clarification. If delay k is short enough, i.e. signal y_k requires samples from the
current sub-frame, no excitation for the current sub-frame is yet available as the
algebraic search is still to be conducted. Therefore, a straightforward solution is to
use the available LP residual (which may be initially calculated even for the whole
frame) as a substitute for the missing part of the excitation vector corresponding to
the time period between legends 902 and 904. On the other hand, a buffer for previous
excitation can usually be made large enough (three dots emphasize this in the figure)
in order to avoid situations where delay k is correspondingly too long and the required
excitation is not available in the buffer anymore.
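The substitution described above can be sketched as follows. The candidate vector for delay k is read from the past excitation buffer, and samples that would fall inside the current sub-frame are taken from the LP residual instead. All names are hypothetical, and the score is the normalized correlation between the target and the filtered candidate, assumed here to match the closed-loop criterion.

```python
import math


def candidate_vector(past_excitation, residual, lag, length):
    """Adaptive codebook candidate at the given lag, extended with the LP residual."""
    if not 1 <= lag <= len(past_excitation):
        raise ValueError("lag must fit inside the past excitation buffer")
    vector = []
    for n in range(length):
        if n < lag:
            # sample lies in the past: take it from the excitation buffer
            vector.append(past_excitation[len(past_excitation) - lag + n])
        else:
            # sample lies in the current sub-frame: substitute the LP residual
            vector.append(residual[n - lag])
    return vector


def convolve_truncated(x, h):
    """Filter x with impulse response h, keeping len(x) output samples."""
    return [
        sum(h[n - j] * x[j] for j in range(n + 1) if n - j < len(h))
        for n in range(len(x))
    ]


def closed_loop_lag(target, h, past_excitation, residual, lags):
    """Return the lag maximizing sum(x * y_k) / sqrt(sum(y_k * y_k))."""
    best_lag, best_score = None, float("-inf")
    for lag in lags:
        y = convolve_truncated(
            candidate_vector(past_excitation, residual, lag, len(target)), h
        )
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        score = sum(t * v for t, v in zip(target, y)) / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

With a lag of 2 and a sub-frame of 4 samples, the last two samples of the candidate come from the residual, which is exactly the region between legends 902 and 904 where no true excitation exists yet.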

SUMMARY OF THE INVENTION

The object of the present invention is to improve the excitation signal modeling and
alleviate the existing defects in contemporary source coding, e.g. speech coding,
methods. The object is achieved by introducing the concept of time advanced
excitation generation. The excitation signal produced by, for example, a fixed
excitation codebook is determined in advance to partly cover the next frame or sub-frame
as well in addition to the current frame. Hence the excitation is "time advanced" by
e.g. half of the (sub-)frame length. This is achieved without increasing the overall
coding delay whenever a frame look-ahead is in any case applied in the coding procedure.
Look-ahead is an additional buffer that already exists in many state of the art speech
coders and includes samples from the following frame. The reason why the look-ahead
buffer is originally included in the encoders is based on the LP modeling: during the
LPC analysis of the current frame it has been found advantageous to take the
forthcoming frame into account as well in order to guarantee a smooth enough transition
between the adjacent frames.

The aforesaid procedure offers a clear advantage over the prior art especially when the
LP residual has occasional peaks embedded. This results from the fact that the
number of pulses in a (sub-)frame may actually be doubled by advancing pulses from a
certain frame to the adjacent next frame. Thus the invention entails the benefits of
variable-rate source coding on a frame-by-frame basis but the true bit rate of the
encoded signal at the output is fixed, and the overall system complexity remains at a
relatively low level compared to solutions with traditional variable-rate coding. The
invention is still applicable both to fixed-rate and variable-rate coders.

Respectively, as the true time advanced excitation can be used instead of the LP
residual during the closed-loop search of the adaptive codebook parameters, the error
signal modeling result is improved.

According to the invention, a source coding method enabling at least partial
subsequent reconstruction [...] thereof has the steps of

-dividing the source data signal into consecutive blocks,
-extracting a first set of parameters related to said filter describing properties of a
first block covering a first time period,
-extracting a second set of parameters related to said excitation signal for said
filter, where said second set of parameters is determined from and describing
properties of both the first block and a second block following the first block
within a second time period starting later than said first time period and extending
outside said first time period.

In another aspect of the invention, a method for decoding an encoded data signal divided
into consecutive blocks has the steps of

-obtaining a first set of parameters for constructing a synthesis filter, said first set
of parameters describing properties of a first block covering a first time period,
-obtaining a second set of parameters for constructing an excitation signal for
said synthesis filter, said second set of parameters describing properties of both
the first block and a second block following the first block within a second time
period starting later than said first time period and extending outside said first
time period,
-obtaining at least part of a previous second set of parameters for constructing an
excitation signal for said synthesis filter, said previous second set of parameters
describing properties of said first block during at least the time period between
the beginning of said first time period and the beginning of said second time
period,
-combining the contribution of said previous second set of parameters and said
second set of parameters for said excitation signal within the first time period,
-constructing an excitation signal of said first block for said synthesis filter by
utilizing said combination, and
-filtering said constructed excitation signal through said synthesis filter.

In a further aspect of the invention, an electronic device for encoding source data
divided into consecutive blocks to be represented by at least a first and a second set
of parameters, comprises processing means and memory means for processing and storing
instructions and data, and data transfer means for accessing data, and the device is
arranged to determine said second set of parameters describing properties of both a
first block covering a first time period, properties of said first block described by
said first set of parameters, and a second block following the first block within a
second time period starting later than said first time period and extending outside said
first time period.

In a further aspect of the invention, an electronic device for decoding source data
divided into consecutive blocks, comprises processing means and memory means for
processing and storing instructions and data, and data transfer means for accessing
data, and the device is arranged to obtain

a first set of parameters for constructing a synthesis filter, said first set of
parameters describing properties of a first block covering a first time period,

a second set of parameters for constructing an excitation signal for said synthesis
filter, said second set of parameters describing properties of both the first block and
a second block following the first block within a second time period starting later than
said first time period and extending outside said first time period,

at least part of a previous second set of parameters for constructing an excitation
signal for said synthesis filter, said previous second set of parameters describing
properties of said first block during at least the time period between the beginning of
said first time period and the beginning of said second time period,

said device further arranged to combine the contribution of said previous second set of
parameters and said second set of parameters, for said excitation signal within said
first time period,

to construct an excitation signal of said first block for said synthesis filter by
utilizing said combination, and

to filter said constructed excitation signal through said synthesis filter.

In a further aspect of the invention, a computer program for encoding source data
divided into consecutive blocks to be represented by at least a first and a second set of
parameters, comprises code means to determine said second set of parameters
describing properties of both a first block covering a first time period, properties of
said first block described by said first set of parameters, and a second block following
the first block within a second time period starting later than said first time period
and extending outside said first time period.

Still in a further aspect of the invention, a computer program for decoding source data
represented by at least a first and a second set of parameters, where said first set of
parameters relate to a synthesis filter and said second set of parameters to an
excitation signal for said filter, said data divided into consecutive blocks, said first
set of parameters describing properties of a first block covering a first time period
and said second set of parameters describing properties of both the first block and a
second block following the first block within a second time period starting later than
said first time period and extending outside said first time period, comprises code
means,

by utilizing at least part of a previous second set of parameters for constructing an
excitation signal for said synthesis filter, said previous second set of parameters
describing properties of said first block during at least the time period between the
beginning of said first time period and the beginning of said second time period,

to combine the contribution of said previous second set of parameters and said second
set of parameters, for said excitation signal within said first time period,

to construct an excitation signal of said first block for said synthesis filter by
utilizing said combination, and

to filter said constructed excitation signal through said synthesis filter.

The term "set" refers generally to a collection of one or more elements, e.g.
parameters.

In an embodiment of the invention, the proposed method for excitation generation is
utilized in a CELP type speech coder. A speech frame is divided into sub-frames that
are analysed first as a whole, then one at a time. In order to determine an advanced
excitation signal, the target signal and the fixed codebook are shifted for example half
a sub-frame forward during the analysis stage.




Accompanying dependent claims disclose 

BRIEF DESCRIPTION OF THE DRAWINGS 

Hereinafter the invention is described in more detail by reference to the attached
drawings wherein

Fig. 1 discloses a generic sketch of a simplified human speech production model.
Fig. 2 illustrates a block diagram of a typical CELP speech encoder.
Fig. 3 illustrates a block diagram of a typical CELP speech decoder.
Fig. 4 depicts a CELP synthesis model for speech generation.
Fig. 5 discloses a typical scenario in a CELP type speech encoding where the target
signal is modeled with a fixed codebook vector.
Fig. 6 illustrates a block diagram of a CELP encoder according to the invention.
Fig. 7 illustrates a block diagram of a [...].
Fig. 8A illustrates target signal modeling with fixed two pulses per sub-frame in a
conventional speech codec.
Fig. 8B illustrates target signal modeling with a maximum of four pulses per
sub-frame [...].
Fig. 9A illustrates a scenario wherein the LP residual has to be used as a substitute
for true excitation [...] codecs.
Fig. 9B illustrates [...] further use in a closed-loop search.
Fig. 10 discloses a flow diagram of the method of the invention for encoding a data
signal.
Fig. 11 discloses a flow diagram of the method of the invention for decoding an
encoded data signal.
Fig. 12 discloses a block diagram of a device according to the invention.

DETAILED DESCRIPTION OF THE INVENTION

Figures 1-5, 8A, and 9A were already discussed in conjunction with the description of
related prior art.

Figure 6 discloses, by way of example only, a block diagram of a CELP encoder
utilizing the proposed technique of time advancing the excitation signal. LPC analysis
is performed once per frame, and LTP analysis and excitation search for every
sub-frame in a frame comprising four sub-frames. The codec also includes a look-ahead
buffer for input speech.

The encoding process of the invention comprises similar general steps as the prior art
methods. LPC analysis 604 provides LP parameters, and LTP analysis 602 results in lag T
and gain g2 terms. The optimal excitation search loop comprises codebook 606,
multiplier 616, LTP/adaptive codebook and LPC synthesis filters 608, 610, adder 618,
weighting filter 612 and search logic 614. In addition, memory 622 for storing the
selected excitation vector or indication thereof for a certain sub-frame is included,
together with combine logic 620 to join the last half of the previously selected and
stored excitation vector, which was calculated during analysis of the previous sub-frame
but targeted for the first half of the current sub-frame, and the first part of the
currently selected excitation vector for gain determination as described later.

The first difference between prior art solutions and the one of the invention occurs in
connection with the calculation of the target signal for the excitation codebook search.
If the excitation codebook is shifted for example half of a sub-frame ahead, the latter
half of the codebook resides in the next sub-frame. Considering the last sub-frame in a
frame, the look-ahead buffer may be correspondingly exploited. In addition, the
amount of shifting can be varied on the basis of a separate (e.g. manually controlled)
shift control parameter or of the characteristics of the input data, for example. The
parameter may be received from an external entity, e.g. from a network entity such as a
radio network controller in the case of [...]. The data may be statistically analysed
and, if seen necessary (e.g. occasional peak formations found in the target signal), the
shifting can be dynamically introduced to the coding process or the existing shifting
may be altered. Then the selected shift parameter value can be transmitted to the
receiving end (to be used by the decoder) either separately or as embedded in the speech
frames or [...] frame or upon change in the parameter value.

In figure SB, a portion of a target signal (effectively a speech signal from which foe 
effect of adaptive codebook is removed as described hereinbefore) divided into a frame 
offour sub-frames and a look-ahead buffer are disclosed. ^ code 
vector is determined by minimizing foe error 
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where s kdv is the new adyai^^ 
frame-s target ait&fi 

In figure 8B, target (sub-)frame windows are shifted 810 half a sub-frame ahead in
time in relation to the corresponding sub-frames. In this example, the look-ahead
buffer equals half the size of a sub-frame, which limits the possible time shift
to the same amount, i.e. the time shift occurs between 0 and L/2, where L is the
length of a sub-frame. As a generalization, the shift shall be defined as equal to or
less than the length of the look-ahead buffer, since the target signal should always
be [...] the input signal existing in the buffer. Note that memory 622 is not utilised
in calculating the excitation vector.
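
The window shifting described above can be sketched as follows. This is only an illustrative sketch: the function name, the list-based signal representation and the argument names are assumptions made for the example and do not originate from the disclosed embodiment. It shows how each sub-frame target window is read from the frame and the look-ahead buffer with an advance that may not exceed the look-ahead length.

```python
def advanced_target_windows(frame, lookahead, num_subframes, shift):
    """Return the sub-frame target windows, each advanced by `shift` samples.

    `frame` holds the target samples of one frame and `lookahead` the samples
    that follow it. All names here are illustrative assumptions.
    """
    if num_subframes <= 0 or len(frame) % num_subframes != 0:
        raise ValueError("frame length must be a positive multiple of the sub-frame count")
    # The shift cannot exceed the look-ahead: the last window would otherwise
    # need samples that are not yet in the buffer.
    if not 0 <= shift <= len(lookahead):
        raise ValueError("shift must lie between 0 and the look-ahead length")
    sub_len = len(frame) // num_subframes
    signal = list(frame) + list(lookahead)
    return [
        signal[i * sub_len + shift:(i + 1) * sub_len + shift]
        for i in range(num_subframes)
    ]


# Four sub-frames of length L = 4 with a look-ahead and a shift of L/2 = 2 samples.
windows = advanced_target_windows(list(range(16)), [16, 17], 4, 2)
print(windows[0], windows[-1])
```

With a shift of zero the windows coincide with the ordinary sub-frames, so the same routine would also cover the non-advanced case.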

Optionally, if also the impulse response matrix H has been calculated on a sub-frame basis, a
time shift equivalent to the one of the target signal may be introduced to it for minimizing
the error defined by equation 2. Correspondingly, if none of the speech parameters was
actually modeled on a sub-frame basis and only frames are analysed as such, it makes
no substantial difference to the applicability [...]

Referring to equation 2, the pulse positions for an advanced excitation vector are
calculated respectively also in this case but with the time advanced target and optionally
with a similarly advanced impulse response matrix. Possible advancing of the gain factor
g is more or less a mere academic issue, as the gain factor is not needed in
solving for the optimal excitation.






Meanwhile, codebook gain g for the excitation vector is calculated on the basis of the
actual sub-frame as follows

[...]

where c_c is a joint excitation vector

[...] (3)

consisting of [...] c_c(l), l = 1..L, where c_i
corresponds to the excitation vector calculated in the i:th sub-frame and L is the length
of the sub-frame and the excitation vector. Contents of memory 622 are this time
needed in the procedure in order to provide the latter half of the previous sub-frame to the
joint vector.
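
As an illustration of the joint vector and the gain determination, a minimal sketch is given below. It rests on stated assumptions: the gain is taken to be the conventional least-squares gain between the actual sub-frame target and the filtered joint vector, the impulse response is given as a plain list h instead of the matrix H, and all function names are hypothetical. It is not presented as the gain equation of the embodiment.

```python
def join_excitation(prev_vec, cur_vec, shift):
    """Build the excitation covering the actual sub-frame.

    Both vectors were selected against targets advanced by `shift` samples, so
    the last `shift` samples of prev_vec fall at the start of the current
    sub-frame and the first L - shift samples of cur_vec follow them.
    """
    length = len(cur_vec)
    if len(prev_vec) != length or not 0 <= shift <= length:
        raise ValueError("vectors must share a length L and 0 <= shift <= L")
    return list(prev_vec[length - shift:]) + list(cur_vec[:length - shift])


def filter_zero_state(h, c):
    """Convolve c with impulse response h, truncated to len(c) samples."""
    return [
        sum(h[k] * c[n - k] for k in range(min(n, len(h) - 1) + 1))
        for n in range(len(c))
    ]


def least_squares_gain(target, h, joint_vec):
    """Assumed gain: g = <target, y> / <y, y> with y the filtered joint vector."""
    y = filter_zero_state(h, joint_vec)
    energy = sum(v * v for v in y)
    if energy == 0:
        return 0.0
    return sum(t * v for t, v in zip(target, y)) / energy


# Half-sub-frame shift with L = 4: the stored previous vector supplies its latter half.
joint = join_excitation([1, 2, 3, 4], [5, 6, 7, 8], 2)
print(joint, least_squares_gain([6, 8, 10, 12], [1.0], joint))
```

Only the tail of the previous vector is read here, which is why storing just the latter half (or an index from which it can be regenerated) would suffice in the memory.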

As the excitation vectors are just shifted during analysis and synthesis stages in the
encoder/decoder, their internal structure remains intact; the coding of pulse locations
can be kept original and the structure of parameterised frames transferred over the
transmission channel is not changed. Thus also data handling like different parameter
insertion/extraction routines needed in the encoder/decoder do not require
modifications in a traditional coder to be converted into conformity with the proposed
solution.

As to the LTP analysis and the adaptive codebook closed-loop search
thereof in the advanced excitation CELP codec, the situation is depicted in figure 9B.
Differing from the prior art solutions, the past excitation available extends to a point 910 at
the border of the time advanced target signal for the last sub-frame of the previous
frame and the first time advanced target signal of the current frame. Hence, the LTP
analysis is improved as the true excitation can be at least partly utilized instead of mere
LP residual during the closed-loop search. The same analogy applies to the following
sub-frames or a scenario wherein sub-frames are not used at all and modeling takes
place in frame units only.






A block diagram of the decoder of the invention is disclosed in figure 7. The decoder
receives the excitation codebook index u, excitation gain g, LTP parameters (if
present), and LP parameters a(i). First the decoder resolves the excitation vector from
codebook 706 by utilizing [...]
sub-frame vector (memory) 716 as explained earlier. The latter half of the previous vector
is attached to the first half of the current vector in block 714 after which the original
current vector or at least the latter half thereof (or indication thereof) is stored in
memory 716 for future use. The created joint vector is then multiplied 712 by gain g,
and filtered through LTP synthesis 708 and LPC synthesis 710 filters in order to
produce a synthesized speech signal ss(n) in the output.

A flow diagram of the encoding method is disclosed in figure 10. Respectively, the
decoding flow diagram is depicted in figure 11. The flow diagrams are constructed to
further facilitate the understanding of encoder internals although the same basic
principles can already be found in the block diagrams of figures 6 and 7. Step 1002
corresponds to method start-up where e.g. filter memories and parameters are
initialised. In step 1004 the source signal is, if not already, divided into blocks to be
parameterized. Blocks may, for example, be equivalent to frames or sub-frames of the
aforepresented embodiment. Although the flow diagrams in figures 10 and 11 handle
the source data on a single level of block hierarchy, the solutions corresponding to the
actual embodiment where source data was first divided into top-level blocks like
frames and then to the sub-blocks (such as [...] Part of the
overall analysis may be thus executed on a higher level and the rest on a lower level, like
frame level LPC analysis and sub-frame level excitation vector analysis in the
disclosed embodiment. Therefore, it is not crucial to the invention what type of
hierarchy is used, or on what levels certain parameters are analysed as long as the
excitation signal analysis exploits time advancing in relation to the actual block
division of that level. In step 1006 a new block is selected for encoding and LPC
analysis is performed resulting in a set of LP parameters. Such parameters can be
transferred to the recipient as such or in a coded form (as line spectral pairs, for
example), a table index or utilizing whatever suitable indication. The following step
includes LTP analysis 1008 [...] parameters for the closed-loop
LTP/adaptive codebook parameter search. As described hereinbefore, a time advanced
target signal for excitation search is defined in step 1010. In an analysis-by-synthesis type
excitation search loop an excitation vector is selected 1012 from the excitation
codebook and used in synthesizing speech 1014. The procedure is repeated until the
maximum count for a number of iteration rounds is reached or the predefined error
criterion is met 1016. The excitation vector producing the smallest error is normally the
one to be selected. The selected vector (or other indication thereof such as a codebook
index) or at least the part thereof corresponding to the next block, is also stored for
further use. The excitation gain is calculated in step 1018. The overall encoding
process is continued from step 1006 if any unprocessed blocks are left 1020, otherwise the
method is ended in phase 1022.
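
The excitation search loop of steps 1012 to 1016 may be sketched as below. The sketch assumes the conventional CELP error criterion, in which the error left after applying the best possible gain equals the target energy minus the squared correlation divided by the energy of the filtered code vector; this is consistent with the gain factor not being needed for selecting the excitation. The function names, the list-based impulse response h and the parameters max_rounds and error_limit are hypothetical.

```python
def filter_zero_state(h, c):
    """Convolve c with impulse response h, truncated to len(c) samples."""
    return [
        sum(h[k] * c[n - k] for k in range(min(n, len(h) - 1) + 1))
        for n in range(len(c))
    ]


def search_excitation(advanced_target, h, codebook, max_rounds=None, error_limit=0.0):
    """Return (index, error) of the code vector with the smallest error.

    The error for each candidate is the residual left after the best possible
    gain, so no explicit gain value is needed during the search.
    """
    target_energy = sum(t * t for t in advanced_target)
    best_index, best_error = None, float("inf")
    for index, vec in enumerate(codebook):
        if max_rounds is not None and index >= max_rounds:
            break  # maximum count of iteration rounds reached
        y = filter_zero_state(h, vec)
        energy = sum(v * v for v in y)
        if energy == 0:
            continue  # a silent candidate cannot reduce the error
        corr = sum(t * v for t, v in zip(advanced_target, y))
        error = max(0.0, target_energy - corr * corr / energy)
        if error < best_error:
            best_index, best_error = index, error
        if best_error <= error_limit:
            break  # predefined error criterion met
    return best_index, best_error


codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0]]
print(search_excitation([2, 2, 0, 0], [1.0], codebook))
```

The only difference from a non-advanced search is the target passed in: it would be the time advanced window defined in step 1010 instead of the ordinary block.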

In step 1102 the decoding process is ramped up with necessary initialisations etc.
Encoded data is received 1104 in blocks that are, for example, buffered for later
decoding. The current excitation vector for the block under reconstruction is
determined by utilizing the received data in step 1106, which may mean, for example,
retrieving a certain code vector from a codebook on the basis of the received codebook
index. In step 1108 the previous excitation vector (or in practice the required part, e.g.
last half, thereof) or indication thereof is retrieved from the memory and attached to
the relevant first part of the current vector in phase 1110. Then the current vector (or
the more relevant latter part of it) is stored 1112 in the memory (as an index, true
vector or other possible derivative/indication) to be used in connection with the
decoding of the next block. The joint vector is multiplied by excitation gain in phase
1114 and finally filtered through LTP synthesis 1116 and LPC synthesis 1118 filters.
LTP and LP parameters may have been received as such or as coded (indications like
table index, or in a line spectral pair form etc). If there are blocks left to be decoded
1120, the method execution is redirected to step 1106. Otherwise the method is ended
1122. In many cases, step ordering presented in the diagrams may not be an essential
issue; for example, the execution order of phases 1106 and 1108, and 1110 and 1112
can be reversed if deemed purposeful.
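
The decoding steps 1102 to 1118 can be illustrated with the following sketch. Several assumptions are made for brevity: the class and method names are hypothetical, the LTP synthesis filter of step 1116 is omitted, and the LPC synthesis filter is taken to be an all-pole filter y(n) = x(n) - sum of a(k) y(n-k), which is one common sign convention and not necessarily that of the embodiment.

```python
class AdvancedExcitationDecoder:
    """Sketch of a decoder keeping the tail of the previous excitation vector."""

    def __init__(self, codebook, sub_len, shift):
        if not 0 <= shift <= sub_len:
            raise ValueError("shift must lie between 0 and the block length")
        self.codebook = codebook
        self.sub_len = sub_len
        self.shift = shift
        self.memory = [0.0] * shift  # step 1102: no previous vector yet
        self.history = []            # past outputs of the LPC synthesis filter

    def decode_block(self, index, gain, lp_coeffs):
        cur = self.codebook[index]                            # step 1106
        split = self.sub_len - self.shift
        joint = self.memory + list(cur[:split])               # steps 1108 and 1110
        self.memory = list(cur[split:])                       # step 1112
        scaled = [gain * v for v in joint]                    # step 1114
        return self._lpc_synthesis(scaled, lp_coeffs)         # step 1118

    def _lpc_synthesis(self, excitation, lp_coeffs):
        out = []
        keep = max(len(lp_coeffs) - 1, 0)
        for x in excitation:
            # history[0] is the most recent output sample
            y = x - sum(a * past for a, past in zip(lp_coeffs, self.history))
            self.history = [y] + self.history[:keep]
            out.append(y)
        return out


decoder = AdvancedExcitationDecoder([[1, 2, 3, 4], [5, 6, 7, 8]], sub_len=4, shift=2)
print(decoder.decode_block(0, 1.0, []))
print(decoder.decode_block(1, 2.0, []))
```

In this sketch the first decoded block starts with zeros, because no previous vector exists yet to supply the part of the excitation that falls before the first advanced window.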

Figure 12 depicts one option for basic components of a device like a communications
device (e.g. a mobile terminal), a data storage device, an audio recorder/playback
device, a network element (e.g. a base station, a gateway, an exchange or a module
thereof), or a computer capable of processing, storing, and accessing data in
accordance with the invention. Memory 1204, divided between one or more physical
chips, comprises necessary code 1216, e.g. in a form of a computer
program/application, and data 1212, a necessary input for the proposed method
producing an encoded (or respectively decoded) version 1214 as an output. A
processing unit 1202, e.g. a microprocessor, a DSP (digital signal processor), a
microcontroller, or programmable logic, is required for the actual execution of the
method including the encoding and/or decoding of data 1212 in accordance with
instructions 1216 stored in memory 1204. Display 1206 and keypad 1210 are in
principle optional components but still often needed for providing necessary device
control and data visualization means (i.e. a user interface) to the user. Data transfer means
1208, e.g. a CD/floppy/hard drive or a network adapter, are required for handling data
exchange, for example acquiring source data and outputting processed data, with other
devices. Data transfer means 1208 may also indicate audio parts like transducers (A/D
and D/A converters, microphone, loudspeaker, amplifiers etc) that are used to input the
audio signal for processing and/or output the decoded signal. This scenario is
applicable, for example, in the case of mobile terminals and various audio storage
and/or playback devices such as audio recorders and dictating machines utilizing the
method of the invention. The code 1216 for the execution of the proposed method can
be stored and delivered on a carrier medium like a floppy, a CD or a memory card.
Furthermore, a device performing the data encoding and/or decoding according to the
invention may be implemented as a module (e.g. a codec chip or circuit arrangement)
included in or just connected to some other device. Then the module does not have to
contain all the necessary code means for completing the overall task of encoding or
decoding. The module may, for example, receive at least some of the filter parameters
like LP or LTP parameters from an external entity in addition to the unencoded or
encoded data and determine/construct just the excitation signal by itself.


The scope of the invention can be found in the following claims. However, utilized
devices, method steps, data structures etc may vary significantly depending on the
current scenario, still converging to the basic ideas of this invention. For example, it is
clear that the size reduction aspect of source [...]
though, condition for utilizing the proposed method; it can be used just for
representing and analysing the source data with a number of parameters. In addition to
data transfer solutions the invention may be applied in a single device only for data
storage purposes. Furthermore, any kind of source data can be used in the method, not
just speech. However, with data carrying speech characteristics, i.e. data for which the
source-filter approach fits well, the modeling results are presumably most accurate.
Still further, the invention may be used in any kind of device capable of executing
necessary processing steps; the applicable device and component types are thus not
strictly limited to the ones listed hereinbefore.
